The search functionality is under construction.

Author Search Result

[Author] Nozomu TOGAWA(97hit)

61-80hit(97hit)

  • A Multi-Scenario High-Level Synthesis Algorithm for Variation-Tolerant Floorplan-Driven Design

    Koki IGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:7
      Page(s):
    1278-1293

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

  • Area and Delay Estimation in Hardware/Software Cosynthesis for Digital Signal Processor Cores

    Nozomu TOGAWA  Yoshiharu KATAOKA  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Hardware/Software Codesign

      Vol:
    E84-A No:11
      Page(s):
    2639-2647

    Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.

  • A Unified Test Compression Technique for Scan Stimulus and Unknown Masking Data with No Test Loss

    Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E91-A No:12
      Page(s):
    3514-3523

    This paper presents a unified test compression technique for scan stimulus and unknown masking data with seamless integration of test generation, test compression and all unknown response masking for high quality manufacturing test cost reduction. Unlike prior test compression methods, the proposed approach considers the unknown responses during test pattern generation procedure, and then selectively encodes the less specified bits (either 1s or 0s) in each scan slice for compression while at the same time masks the unknown responses before sending them to the response compactor. The proposed test scheme could dramatically reduce test data volume as well as the number of required test channels by using only c tester channels to drive N internal scan chains, where c = 「 log 2N 」 + 2. In addition, because all the unknown responses could be exactly masked before entering into the response compactor, test loss due to unknown responses would be eliminated. Experimental results on both benchmark circuits and larger designs indicated the effectiveness of the proposed technique.

  • A Two-Level Cache Design Space Exploration System for Embedded Applications

    Nobuaki TOJO  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3238-3247

    Recently, two-level cache, L1 cache and L2 cache, is commonly used in a processor. Particularly in an embedded system whereby a single application or a class of applications is repeatedly executed on a processor, its cache configuration can be customized such that an optimal one is achieved. An optimal two-level cache configuration can be obtained which minimizes overall memory access time or memory energy consumption by varying the three cache parameters: the number of sets, a line size, and an associativity, for L1 cache and L2 cache. In this paper, we first extend the L1 cache simulation algorithm so that we can explore two-level cache configuration. Second, we propose two-level cache design space exploration algorithms: CRCB-T1 and CRCB-T2, each of which is based on applying Cache Inclusion Property to two-level cache configuration. Each of the proposed algorithms realizes exact cache simulation but decreases the number of cache hit/miss judgments by a factor of several thousands. Experimental results show that, by using our approach, the number of cache hit/miss judgments required to optimize a cache configurations is reduced to 1/50-1/5500 compared to the exhaustive approach. As a result, our proposed approach totally runs an average of 1398.25 times faster compared to the exhaustive approach. Our proposed cache simulation approach achieves the world fastest two-level cache design space exploration.

  • A High-Level Energy-Optimizing Algorithm for System VLSIs Based on Area/Time/Power Estimation

    Shinichi NODA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-High Level Synthesis

      Vol:
    E85-A No:12
      Page(s):
    2655-2666

    This paper proposes a high-level energy-optimizing algorithm which can synthesize low energy system VLSIs. Given an initial system hardware obtained from an abstract behavioral description, the proposed algorithm applies to it the three energy reduction techniques, 1) reducing supply voltage, 2) selecting lower energy modules, and 3) applying gated clocks. By incorporating our area/delay/power estimation, the proposed algorithm can obtain low energy system VLSIs meeting the constraints of area, delay, and execution time. The proposed algorithm has been incorporated into a high-level synthesis system and experimental results demonstrate effectiveness and efficiency of the algorithm.

  • Trojan-Net Classification for Gate-Level Hardware Design Utilizing Boundary Net Structures

    Kento HASEGAWA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    LETTER-Network and System Security

      Pubricized:
    2020/03/19
      Vol:
    E103-D No:7
      Page(s):
    1618-1622

    Cybersecurity has become a serious concern in our daily lives. The malicious functions inserted into hardware devices have been well known as hardware Trojans. In this letter, we propose a hardware-Trojan classification method at gate-level netlists utilizing boundary net structures. We first use a machine-learning-based hardware-Trojan detection method and classify the nets in a given netlist into a set of normal nets and a set of Trojan nets. Based on the classification results, we investigate the net structures around the boundary between normal nets and Trojan nets, and extract the features of the nets mistakenly identified to be normal nets or Trojan nets. Finally, based on the extracted features of the boundary nets, we again classify the nets in a given netlist into a set of normal nets and a set of Trojan nets. The experimental results demonstrate that our proposed method outperforms an existing machine-learning-based hardware-Trojan detection method in terms of its true positive rate.

  • A Capacitance Measurement Device for Running Hardware Devices and Its Evaluations

    Makoto NISHIZAWA  Kento HASEGAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E103-A No:9
      Page(s):
    1018-1027

    In IoT (Internet-of-Things) era, the number and variety of hardware devices becomes continuously increasing. Several IoT devices are utilized at infrastructure equipments. How to maintain such IoT devices is a serious concern. Capacitance measurement is one of the powerful ways to detect anomalous states in the structure of the hardware devices. Particularly, measuring capacitance while the hardware device is running is a major challenge but no such researches proposed so far. This paper proposes a capacitance measuring device which measures device capacitance in operation. We firstly combine the AC (alternating current) voltage signal with the DC (direct current) supply voltage signal and generates the fluctuating signal. We supply the fluctuating signal to the target device instead of supplying the DC supply voltage. By effectively filtering the observed current in the target device, the filtered current can be proportional to the capacitance value and thus we can measure the target device capacitance even when it is running. We have implemented the proposed capacitance measuring device on the printed wiring board with the size of 95mm × 70mm and evaluated power consumption and accuracy of the capacitance measurement. The experimental results demonstrate that power consumption of the proposed capacitance measuring device is reduced by 65% in low-power mode from measuring mode and proposed device successfully measured capacitance in 0.002μF resolution.

  • An Ising Machine-Based Solver for Visiting-Route Recommendation Problems in Amusement Parks

    Yosuke MUKASA  Tomoya WAKAIZUMI  Shu TANAKA  Nozomu TOGAWA  

     
    PAPER-Computer System

      Pubricized:
    2021/07/08
      Vol:
    E104-D No:10
      Page(s):
    1592-1600

    In an amusement park, an attraction-visiting route considering the waiting time and traveling time improves visitors' satisfaction and experience. We focus on Ising machines to solve the problem, which are recently expected to solve combinatorial optimization problems at high speed by mapping the problems to Ising models or quadratic unconstrained binary optimization (QUBO) models. We propose a mapping of the visiting-route recommendation problem in amusement parks to a QUBO model for solving it using Ising machines. By using an actual Ising machine, we could obtain feasible solutions one order of magnitude faster with almost the same accuracy as the simulated annealing method for the visiting-route recommendation problem.

  • A Multiple Cyclic-Route Generation Method with Route Length Constraint Considering Point-of-Interests

    Tensei NISHIMURA  Kazuaki ISHIKAWA  Toshinori TAKAYAMA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-Intelligent Transport System

      Vol:
    E102-A No:4
      Page(s):
    641-653

    With the spread of map applications, route generation has become a familiar function. Most of route generation methods search a route from a starting point to a destination point with the shortest time or shortest length, but more enjoyable route generation is recently focused on. Particularly, cyclic-route generation for strolling requires to suggest to a user more than one route passing through several POIs (Point-of-Interests), to satisfy the user's preferences as much as possible. In this paper, we propose a multiple cyclic-route generation method with a route length constraint considering POIs. Firstly, our proposed method finds out a set of reference points based on the route length constraint. Secondly, we search a non-cyclic route from one reference point to the next one and finally generate a cyclic route by connecting these non-cyclic routes. Compared with previous methods, our proposed method generates a cyclic route closer to the route length constraint, reduces the number of the same points passing through by approximately 80%, and increases the number of POIs passed approximately 1.49 times.

  • A Locality-Aware Hybrid NoC Configuration Algorithm Utilizing the Communication Volume among IP Cores

    Seungju LEE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E95-A No:9
      Page(s):
    1538-1549

    Network-on-chip (NoC) architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). With the explosive growth in the usage of multimedia applications, it is expected that NoC serves as a multimedia server supporting multi-class services. In this paper, we propose a configuration algorithm for a hybrid bus-NoC architecture together with simulation results. Our target architecture is a hybrid bus-NoC architecture, called busmesh NoC, which is a generalized version of a hybrid NoC with local buses. In our BMNoC configuration algorithm, cores which have a heavy communication volume between them are mapped in a cluster node (CN) and connected by a local bus. CNs can have communication with each other via edge switches (ESes) and mesh routers (MRs). With this hierarchical communication network, our proposed algorithm can improve the latency as compared with conventional methods. Several realistic applications applied to our algorithm illustrate the better performance than earlier studies and feasibility of our proposed algorithm.

  • A Fast Elliptic Curve Cryptosystem LSI Embedding Word-Based Montgomery Multiplier

    Jumpei UCHIDA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-System LSIs and Microprocessors

      Vol:
    E89-C No:3
      Page(s):
    243-249

    Elliptic curve cryptosystems are expected to be a next standard of public-key cryptosystems. A security level of elliptic curve cryptosystems depends on a difficulty of a discrete logarithm problem on elliptic curves. The security level of a elliptic curve cryptosystem which has a public-key of 160-bit is equivalent to that of a RSA system which has a public-key of 1024-bit. We propose an elliptic curve cryptosystem LSI architecture embedding word-based Montgomery multipliers. A Montgomery multiplication is an efficient method for a finite field multiplication. We can design a scalable architecture for an elliptic curve cryptosystem by selecting structure of word-based Montgomery multipliers. Experimental results demonstrate effectiveness and efficiency of the proposed architecture. In the hardware evaluation using 0.18 µm CMOS library, the high-speed design using 126 Kgates with 208-bit multipliers achieved operation times of 3.6 ms for a 160-bit point multiplication.

  • A High-Level Synthesis Algorithm with Inter-Island Distance Based Operation Chainings for RDR Architectures

    Kotaro TERADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1366-1375

    In deep-submicron era, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps: The first step enumerates candidate operations for chaining. The second step introduces maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.

  • A Floorplan-Driven High-Level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi FUJIWARA  Kazushi KAWAMURA  Shin-ya ABE  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1392-1405

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock tradings and reconfigurable network processings. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer's cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer's cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.

  • ECC-Based Bit-Write Reduction Code Generation for Non-Volatile Memory

    Masashi TAWADA  Shinji KIMURA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2494-2504

    Non-volatile memory has many advantages such as high density and low leakage power but it consumes larger writing energy than SRAM. It is quite necessary to reduce writing energy in non-volatile memory design. In this paper, we propose write-reduction codes based on error correcting codes and reduce writing energy in non-volatile memory by decreasing the number of writing bits. When a data is written into a memory cell, we do not write it directly but encode it into a codeword. In our write-reduction codes, every data corresponds to an information vector in an error-correcting code and an information vector corresponds not to a single codeword but a set of write-reduction codewords. Given a writing data and current memory bits, we can deterministically select a particular write-reduction codeword corresponding to the data to be written, where the maximum number of flipped bits are theoretically minimized. Then the number of writing bits into memory cells will also be minimized. Experimental results demonstrate that we have achieved writing-bits reduction by an average of 51% and energy reduction by an average of 33% compared to non-encoded memory.

  • Scan-Based Side-Channel Attack on the Camellia Block Cipher Using Scan Signatures

    Huiqian JIANG  Mika FUJISHIRO  Hirokazu KODERA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E98-A No:12
      Page(s):
    2547-2555

    Camellia is a block cipher jointly developed by Mitsubishi and NTT of Japan. It is designed suitable for both software and hardware implementations. One of the design-for-test techniques using scan chains is called scan-path test, in which testers can observe and control the registers inside the LSI chip directly in order to check if the LSI chip correctly operates or not. Recently, a scan-based side-channel attack is reported which retrieves the secret information from the cryptosystem using scan chains. In this paper, we propose a scan-based attack method on the Camellia cipher using scan signatures. Our proposed method is based on the equivalent transformation of the Camellia algorithm and the possible key candidate reduction in order to retrieve the secret key. Experimental results show that our proposed method sucessfully retrieved its 128-bit secret key using 960 plaintexts even if the scan chain includes the Camellia cipher and other circuits and also sucessfully retrieves its secret key on the SASEBO-GII board, which is a side-channel attack standard evaluation board.

  • Maple: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for Field-Programmable Gate Arrays

    Nozomu TOGAWA  Masao SATO  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E77-A No:12
      Page(s):
    2028-2038

    Technology mapping algorithms for LUT (Look Up Table) based FPGAs have been proposed to transfer a Boolean network into logic-blocks. However, since those algorithms take no layout information into account, they do not always lead to excellent results. In this paper, a simultaneous technology mapping, placement and global routing algorithm for FPGAs, Maple, is presented. Maple is an extended version of a simultaneous placement and global routing algorithm for FPGAs, which is based on recursive partition of layout regions and block sets. Maple inherits its basic process and executes the technology mapping simultaneously in each recursive process. Therefore, the mapping can be done with the placement and global routing information. Experimental results for some benchmark circuits demonstrate its efficiency and effectiveness.

  • An FPGA Layout Reconfiguration Algorithm Based on Global Routes for Engineering Changes in System Design Specifications

    Nozomu TOGAWA  Kayoko HAGI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E81-A No:5
      Page(s):
    873-884

    Rapid system prototyping is one of the main applications for field-programmable gate arrays (FPGAs). At the stage of rapid system prototyping, design specifications can often be changed since they cannot be determined completely. In this paper, layout design change is focused on and a layout reconfiguration algorithm is proposed for FPGAs. The target FPGA architecture is developed for transport processing. In order to implement more various circuits flexibly, it has three-input lookup tables (LUTs) as minimum logic cells. Since its logic granularity is finer than that of conventional FPGAs, it requires more routing resources to connect them and minimization of routing congestion is indispensable. In layout reconfiguration, the main problem is to add LUTs to initial layouts. Our algorithm consists of two steps: For given placement and global routing of LUTs, in Step 1 an added LUT is placed with allowing that the position of the added LUT may overlap that of a preplaced LUT; Then in Step 2 preplaced LUTs are moved to their adjacent positions so that the overlap of the LUT positions can be resolved. Global routes are updated corresponding to reconfiguration of placement. The algorithm keeps routing congestion small by evaluating global routes directly both in Steps 1 and 2. Especially in Step 2, if the minimum number of preplaced LUTs are moved to their adjacent positions, our algorithm minimizes routing congestion. Experimental results demonstrate that, if the number of added LUTs is at most 20% of the number of initial LUTs, our algorithm generates the reconfigured layouts whose routing congestion is as small as that obtained by executing a conventional placement and global routing algorithm. Run time of our algorithm is within approximately one second.

  • A Fast Scheduling Algorithm Based on Gradual Time-Frame Reduction for Datapath Synthesis

    Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E81-A No:6
      Page(s):
    1231-1241

    This paper proposes a fast scheduling algorithm based on gradual time-frame reduction for datapath synthesis of digital signal processing hardwares. The objective of the algorithm is to minimize the costs for functional units and registers and to maximize connectivity under given computation time and initiation interval. Incorporating the connectivity in a scheduling stage can reduce multiplexer counts in resource binding. The algorithm maximizes connectivity with maintaining low time complexity and obtains datapath designs with totally small hardware costs in the high-level synthesis environment. The algorithm also resolves inter-iteration data dependencies and thus realizes pipelined datapaths. The experimental results demonstrate that the proposed algorithm reduces the multiplexer counts after resource binding with maintaining low costs for functional units and registers compared with eight conventional schedulers.

  • A Hardware-Trojans Identifying Method Based on Trojan Net Scoring at Gate-Level Netlists

    Masaru OYA  Youhua SHI  Noritaka YAMASHITA  Toshihiko OKAMURA  Yukiyasu TSUNOO  Satoshi GOTO  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-Logic Synthesis, Test and Verification

      Vol:
    E98-A No:12
      Page(s):
    2537-2546

    Outsourcing IC design and fabrication is one of the effective solutions to reduce design cost but it may cause severe security risks. Particularly, malicious outside vendors may implement Hardware Trojans (HTs) on ICs. When we focus on IC design phase, we cannot assume an HT-free netlist or a Golden netlist and it is too difficult to identify whether a given netlist is HT-free or not. In this paper, we propose a score-based hardware-trojans identifying method at gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but it detects a net included in HTs, which is called Trojan net, instead. Firstly, we observe Trojan nets from several HT-inserted benchmarks and extract several their features. Secondly, we give scores to extracted Trojan net features and sum up them for each net in benchmarks. Then we can find out a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks and ISCAS85 benchmarks as well as HT-free and HT-inserted AES gate-level netlists. Experimental results demonstrate that our method successfully identify all the HT-inserted gate-level benchmarks to be “HT-inserted” and all the HT-free gate-level benchmarks to be “HT-free” in approximately three hours for each benchmark.

  • Hardware-Trojans Rank: Quantitative Evaluation of Security Threats at Gate-Level Netlists by Pattern Matching

    Masaru OYA  Noritaka YAMASHITA  Toshihiko OKAMURA  Yukiyasu TSUNOO  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2335-2347

    Since digital ICs are often designed and fabricated by third parties at any phases today, we must eliminate risks that malicious attackers may implement Hardware Trojans (HTs) on them. In particular, they can easily insert HTs during design phase. This paper proposes an HT rank which is a new quantitative analysis criterion against HTs at gate-level netlists. We have carefully analyzed all the gate-level netlists in Trust-HUB benchmark suite and found out several Trojan net features in them. Then we design the three types of Trojan points: feature point, count point, and location point. By assigning these points to every net and summing up them, we have the maximum Trojan point in a gate-level netlist. This point gives our HT rank. The HT rank can be calculated just by net features and we do not perform any logic simulation nor random test. When all the gate-level netlists in Trust-HUB, ISCAS85, ISCAS89 and ITC99 benchmark suites as well as several OpenCores designs, HT-free and HT-inserted AES netlists are ranked by our HT rank, we can completely distinguish HT-inserted ones (which HT rank is ten or more) from HT-free ones (which HT rank is nine or less). The HT rank is the world-first quantitative criterion which distinguishes HT-inserted netlists from HT-free ones in all the gate-level netlists in Trust-HUB, ISCAS85, ISCAS89, and ITC99.

61-80hit(97hit)